1.  Introduction


Rates of convergence for stochastic approximation problems were given in [1, 2, 3, 4], the latter two references getting better results via weak convergence methods, for both constrained and unconstrained systems.

A form of stochastic approximation algorithm which is of increasing importance is the following. Let $\{a_n\}$ denote a sequence of positive real numbers with $\sum_n a_n = \infty$, $h$ a suitable function and $\{\xi_n\}$ a sequence of random variables. Define the sequence of $R^r$-valued random variables $\{X_n\}$ by

(1.1)    $X_{n+1} = X_n + a_n h(X_n, \xi_n).$


In [1] - [4], the function $h$ was essentially additive in $\xi_n$, as is usually the case in classical Kiefer-Wolfowitz and Robbins-Monro type stochastic approximation algorithms. Of course, if $\{\xi_n\}$ is a sequence of independent random variables, then $h(X_n, \xi_n)$ can be written in the form $E[h(X_n, \xi_n) \mid X_n] + \psi_n$, where $\psi_n = h(X_n, \xi_n) - E[h(X_n, \xi_n) \mid X_n]$ is a member of an orthogonal sequence, and we are back to the classical case. In the applications that we have in mind the $\{\xi_n\}$ can be rather general processes.
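As a rough illustration of recursion (1.1) with a function $h$ that is not additive in the noise, the following sketch runs a scalar parameter-identification example. Everything specific in it is an assumption made for illustration and is not taken from the paper: the model $y_n = \theta \varphi_n + v_n$, the AR(1) regressor $\varphi_n$, the choice $h(x, \xi) = \varphi (y - \varphi x)$ with $\xi_n = (\varphi_n, y_n)$, and all numerical values.

```python
import random

def run_recursion(theta=2.0, A=1.0, alpha=1.0, n_steps=20000, seed=0):
    """Iterate X_{n+1} = X_n + a_n h(X_n, xi_n) with a_n = A / (n+1)^alpha.

    Hypothetical scalar example: xi_n = (phi_n, y_n), y_n = theta*phi_n + v_n,
    and h(x, xi) = phi * (y - phi * x), which is not additive in xi.
    """
    rng = random.Random(seed)
    x = 0.0      # X_0, an arbitrary starting estimate
    phi = 0.0    # regressor; an AR(1) sequence, so {xi_n} is not independent
    for n in range(n_steps):
        phi = 0.5 * phi + rng.gauss(0.0, 1.0)   # correlated regressor
        y = theta * phi + rng.gauss(0.0, 0.5)   # noisy observation
        a_n = A / (n + 1) ** alpha              # gain sequence
        x += a_n * phi * (y - phi * x)          # one step of (1.1)
    return x

estimate = run_recursion()
print(estimate)
```

In this example $h_x(\theta, \xi_n) = -\varphi_n^2$ is itself random and correlated in time, which is the feature that separates the present setting from the classical additive-noise case.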

The more general form (1.1) arises in applications to problems in the recursive identification of the parameters of linear systems, or in the so-called self-tuning regulators or in other applications of adaptive systems [5, 6]. Such applications are the motivation for this work. Often $X_n$ is an estimate of the vector system parameter and $\xi_n$ is a random vector which is related to the measured inputs and outputs of the system. The rate of convergence problem for such situations has not been dealt with, and somewhat different methods are required.

In this paper we develop rate of convergence results for (1.1) under quite reasonable conditions. Owing to the way in which (1.1) arises in applications, $\{\xi_n\}$ is rarely a sequence of independent random variables, and the conditional expectation of $h(X_n, \xi_n)$ given the past data is rarely a function only of $X_n$. Thus classical rate of convergence methods (as in [1], [2]) cannot be used directly. We use some of the ideas in [3], [4], but adapted to our case, and under weaker conditions on the noise sequences.

The problem is formulated and some assumptions given in Section 2. Weak convergence of a sequence of normalized $\{X_n\}$ is given in Section 3, and the general rate result appears in Section 4.


2.  Terminology  and  Problem  Formulation 
For $\alpha \in (0,1]$ and $A$ a matrix, set $a_n = A/(n+1)^\alpha$. Since we are concerned with rates of convergence, we assume convergence (see [4] for a detailed discussion of the convergence both w.p. 1 and weakly). In particular, we suppose that there is a $\theta \in R^r$ such that $X_n \to \theta$ w.p. 1. Set $U_n = (n+1)^{\alpha/2}(X_n - \theta)$, $\Delta t_n = (n+1)^{-\alpha}$, $h_n = h(\theta, \xi_n)$ and $\tilde h_n = ((n+2)/(n+1))^{\alpha/2} h_n$. Let $h(\cdot, \xi)$ be continuously differentiable for each $\xi$, with the gradient $h_x(\cdot,\cdot)$ being Borel-measurable. There is a function $O(\cdot)$ such that with $H_n$ defined by (2.1), (2.2) holds. (See [3], eqn. (5.2) for a related calculation for the case where $h$ is additive in $\xi$.)


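(2.1)    $H_n = A h_x(\theta, \xi_n) + \dfrac{\alpha}{2(n+1)^{1-\alpha}}\, I + O\!\left(\dfrac{1}{n+1}\right) I$
         $\qquad + A \left(\dfrac{n+2}{n+1}\right)^{\alpha/2} \displaystyle\int_0^1 [h_x(\theta + t(X_n - \theta), \xi_n) - h_x(\theta, \xi_n)]\,dt$
         $\qquad + A \left[\left(\dfrac{n+2}{n+1}\right)^{\alpha/2} - 1\right] h_x(\theta, \xi_n)$

(2.2)*   $U_{n+1} = (I + \Delta t_n H_n) U_n + A \sqrt{\Delta t_n}\, \tilde h_n$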


For future use define $\delta W_n = \sqrt{\Delta t_n}\, h_n$, $\delta \tilde W_n = \sqrt{\Delta t_n}\, \tilde h_n$.

Lemmas 1 and 2 contain some preparatory results concerning the iteration (2.2), and tightness of $\{U_n\}$ (i.e., $\sup_n P(|U_n| \ge N) \to 0$ as $N \to \infty$) is proved in Theorem 1.

Next, following the general approach of [3], a sequence of processes $\{U^N(\cdot)\}$ is defined as follows. Let $t_n = \sum_{i=0}^{n-1} \Delta t_i$, $t_0 = 0$ and define $m(t) = \max\{k: t_k \le t\}$. Set $U^N(0) = U_N$ and $U^N(t) = U_{N+n}$ in $[t_{N+n} - t_N, t_{N+n+1} - t_N)$. Thus $U^N(\cdot)$ is a process whose paths are piecewise constant and in $D^r[0,\infty)$, the space of $R^r$-valued functions which are right continuous on $[0,\infty)$ and have left-hand limits on $(0,\infty)$. Since it will be important for us to go back and forth between the $\{U_n\}$ and $\{U^N(\cdot)\}$ sequences, the functions $m(\cdot)$ and $t_n$ will be used quite frequently, occasionally (and regrettably) causing some complicated notation.
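The time scale $t_n$, the index function $m(t)$ and the piecewise constant interpolation can be sketched in a few lines. This is only an illustrative helper under the definitions above; the function names `time_grid`, `m` and `interpolate` are hypothetical, and the convention $m(t) = \max\{k : t_k \le t\}$ is assumed.

```python
from bisect import bisect_right
from itertools import accumulate

def time_grid(alpha, n_max):
    """Return [t_0, ..., t_{n_max}] with t_0 = 0 and t_n = sum_{i<n} (i+1)^(-alpha)."""
    steps = [(i + 1) ** (-alpha) for i in range(n_max)]
    return [0.0] + list(accumulate(steps))

def m(t, grid):
    """m(t) = max{k : t_k <= t}; meaningful only while t < grid[-1]."""
    return bisect_right(grid, t) - 1

def interpolate(U, N, t, grid):
    """U^N(t) = U_{N+n} on [t_{N+n} - t_N, t_{N+n+1} - t_N)."""
    return U[m(grid[N] + t, grid)]

grid = time_grid(1.0, 10)   # t_0 = 0, t_1 = 1, t_2 = 1.5, ...
print(m(1.2, grid), interpolate(list(range(11)), 1, 0.6, grid))
```

Because $\Delta t_n \to 0$, a fixed interval of interpolated time covers more and more iterates as $N$ grows, which is why the interpolation is a natural scale for the limit theorems.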

Owing to the scale factor $a_n = A \Delta t_n$, the interpolation $U^N(\cdot)$ is quite natural for this problem. In Theorem 2 it will be shown that $\{U^N(\cdot)\}$ is tight in $D^r[0,\infty)$ and converges weakly to the stationary linear Gaussian diffusion (4.1). As is common in applications of weak convergence theory, if a sequence of measures $\{\mu_n\}$ is tight and converges weakly to $\mu$ (all on $R^r$ or $D^r[0,\infty)$), and $\mu_n$ and $\mu$ are induced by processes $X_n(\cdot)$ and $X(\cdot)$, resp. (with paths in $R^r$ or $D^r[0,\infty)$), then we abuse terminology and say that $\{X_n\}$ is tight and converges weakly to $X$. This weak convergence gives us the basic rate of convergence result. Some advantages of our approach are discussed in [3]. It yields the convergence in distribution (to a normally distributed random variable, the stationary distribution of (4.1)) of $\{U_n\}$, but also more, since it gives information on the correlation structure of the process $\{U_{N+n}, n \ge 0\}$ for large $N$.

*From (2.1) we can guess that if $\alpha = 1$ (resp. $\alpha < 1$) the "effective" component of $H_n$ is $(A h_x(\theta, \xi_n) + I/2)$ ($A h_x(\theta, \xi_n)$, resp.).

Remark on weak convergence. Billingsley [7] is the most comprehensive reference. The space $D[0,T]$ is discussed in [7], Sections 14 and 15. A brief summary of relevant facts is given in [4], Chapter 2. $D^r[0,\infty)$ is endowed with the usual ([7], Section 14) Skorokhod topology, with which it is a complete separable metric space. Convergence in $D^r[0,\infty)$ occurs if, for some sequence $T \to \infty$, it occurs (for the truncated functions) in each $D^r[0,T]$.

Assumptions. (A1) - (A5) will be used throughout the paper.

(A1)  $X_n \to \theta$ w.p. 1.

(A2)  $h(\cdot,\cdot)$ is a Borel function, continuously differentiable in its first argument for each value of the second, and the gradient $h_x(\cdot,\cdot)$ is Borel. Also $E h(\theta, \xi_n) = 0$ and

$\displaystyle\int_0^1 [h_x(\theta + t(X_n - \theta), \xi_n) - h_x(\theta, \xi_n)]\,dt \to 0$  w.p. 1

as $n \to \infty$. (Certainly true if the $\xi_n$ are bounded and $h_x(\cdot,\cdot)$ is continuous.)
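(A3a)  There is a matrix $H$ such that for some (hence each) $T > 0$ and each $\varepsilon > 0$

$\displaystyle\lim_{n\to\infty} P\Big\{ \sup_{j \ge n} \max_{0 \le t \le T} \Big| \sum_{i=m(jT)}^{m(jT+t)-1} \Delta t_i (h_x(\theta, \xi_i) - H) \Big| \ge \varepsilon \Big\} = 0.$

(A3b)  There is a constant $\tau$ such that for each $\varepsilon > 0$ and $T > 0$,

$\displaystyle\lim_{n\to\infty} P\Big\{ \sup_{j \ge n} \max_{0 \le t \le T} \Big| \sum_{i=m(jT)}^{m(jT+t)-1} \Delta t_i (|h_x(\theta, \xi_i)| - \tau) \Big| \ge \varepsilon \Big\} = 0,$

where $|x| = (x'x)^{1/2}$ and $|M| = \sup_{|x|=1} |Mx|$ if $M$ is a matrix.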


Remark on (A3a and b). Conditions of type (A3a, A3b) were used extensively in the monograph [4], and as shown in that reference are rather weak and quite natural for the problem. See, for example, the several cases discussed in [ ], Chapter 2.2. The conditions are commonly satisfied by the noise processes which appear in the usual applications to the identification problem. We mention only the following three cases for (A3a): (a) $\sum_n a_n^2 < \infty$ and $\{h_x(\theta, \xi_n) - E h_x(\theta, \xi_n)\}$ orthogonal; (b) $h_x(\theta, \xi_n) - E h_x(\theta, \xi_n) = \sum_{j=0}^{\infty} b_j \psi_{n-j}$ for a broad class of $\{b_j\}$, $\{\psi_j\}$ where $\{\psi_j\}$ are independent and identically distributed; (c) $\{\xi_n\}$ stationary, (A5) holds for $h_x$ replacing $h$ and $\sum_i a_i^2 (\log_2 i)^2 < \infty$ holds.

In order to illustrate our terminology and get some additional insight into (A3), let us define a process $\eta(t)$ as follows: $\eta(0) = 0$, and $\eta(t) = \sum_{i=0}^{n-1} \Delta t_i (h_x(\theta, \xi_i) - H)$ on $[t_n, t_{n+1})$. Then

$\eta(t) = \displaystyle\sum_{i=0}^{m(t)-1} \Delta t_i (h_x(\theta, \xi_i) - H).$

Condition (A3a) implies that the variation of the "increasing compressed interpolation" $\eta(t)$ over an arbitrary interval $(a, a+T)$ goes to zero w.p. 1 as $a \to \infty$.

(A4)  If $\alpha = 1$, set $\bar H = AH + I/2$, and if $\alpha < 1$, set $\bar H = AH$. The eigenvalues of $\bar H$ have negative real parts.
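The stability condition (A4) is easy to check numerically for a given pair $(A, H)$. The sketch below handles the $2 \times 2$ case with plain nested lists; the helper names and the example matrices are hypothetical and serve only to illustrate that the condition is more demanding when $\alpha = 1$, because of the extra $I/2$ term.

```python
import cmath

def effective_matrix(A, H, alpha):
    """Return H_bar = A H + I/2 if alpha == 1, else H_bar = A H (2x2 nested lists)."""
    AH = [[sum(A[i][k] * H[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    if alpha == 1:
        AH[0][0] += 0.5
        AH[1][1] += 0.5
    return AH

def eigenvalues_2x2(M):
    """Eigenvalues of a 2x2 matrix computed from its trace and determinant."""
    tr = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    root = cmath.sqrt(tr * tr - 4.0 * det)
    return (tr + root) / 2.0, (tr - root) / 2.0

def satisfies_A4(A, H, alpha):
    """True if every eigenvalue of H_bar has a negative real part."""
    return all(lam.real < 0 for lam in eigenvalues_2x2(effective_matrix(A, H, alpha)))

I2 = [[1.0, 0.0], [0.0, 1.0]]
H = [[-0.4, 0.0], [0.0, -1.0]]
print(satisfies_A4(I2, H, 0.8), satisfies_A4(I2, H, 1))
```

With these illustrative values the condition holds for $\alpha < 1$ but fails for $\alpha = 1$, since $-0.4 + 1/2 > 0$; scaling the gain matrix up (for example $A = 2I$) restores it.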

(A5)  Define $R_{mk}$ by $R_{mk} = E h'(\theta, \xi_m) h(\theta, \xi_k)$. Then $\sup_m \sum_{k=0}^{\infty} |R_{mk}| < \infty$. Also $\sup_m E|h_x(\theta, \xi_m)|^2 < \infty$.


3.  Tightness of $\{U_n\}$

In order to simplify the presentation of the chain of calculations, we present them partially in a sequence of lemmas. Among other things, we wish to show that the $H_n$ and $\tilde h_n$ in (2.2) can be replaced by $\bar H$ and $h_n$, resp. Apart from differences due to the greater generality of the noise here, the main differences between the treatment of (1.1) and the past work where $h$ was assumed additive in $\xi$ are due to the randomness of the $H_n$. To deal with them, we exploit the "averaging" or "smoothing" conditions (A3) and the stability condition (A4).

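We use $K$ to denote a constant whose value may change from usage to usage. Henceforth $\{\varepsilon_k\}$ denotes a sequence of positive real numbers such that $\sum_k \varepsilon_k < \infty$. Let $\{M_k\}$ be a sequence of integers tending to $\infty$ as $k \to \infty$ and define the measurable sets (in the sample space) $A_k$, $B_k$ and $C_k$ by (note that $j\varepsilon_k \ge t_{M_k}$ and $m(j\varepsilon_k) \ge M_k$ are equivalent statements)

$A_k = \Big\{ \displaystyle\sup_{j\varepsilon_k \ge t_{M_k}} \max_{0 \le t \le \varepsilon_k} \Big| \sum_{i=m(j\varepsilon_k)}^{m(j\varepsilon_k+t)-1} \Delta t_i (h_x(\theta, \xi_i) - H) \Big| \ge \varepsilon_k \Big\},$

$B_k = \Big\{ \displaystyle\sup_{j\varepsilon_k \ge t_{M_k}} \max_{0 \le t \le \varepsilon_k} \Big| \sum_{i=m(j\varepsilon_k)}^{m(j\varepsilon_k+t)-1} \Delta t_i (|h_x(\theta, \xi_i)| - \tau) \Big| \ge \varepsilon_k \Big\},$

$C_k = \Big\{ \displaystyle\sup_{j \ge M_k} \Big| \int_0^1 [h_x(\theta + t(X_j - \theta), \xi_j) - h_x(\theta, \xi_j)]\,dt \Big| \ge \varepsilon_k \Big\}.$

Set $D_k = \bigcup_{j=k}^{\infty} (A_j \cup B_j \cup C_j)$. Choose $M_k$ such that $P\{A_k\} + P\{B_k\} + P\{C_k\} \le \varepsilon_k$ and $\Delta t_i \le \varepsilon_k^2$ for $i \ge M_k$. Such a choice is possible by (A3). Then $P\{D_k\} \to 0$ as $k \to \infty$. Consequently for $\omega \notin D_k$ and $i \ge M_k$, (A3) implies that the individual terms in the sums in (A3) satisfy

$|\Delta t_i (A h_x(\theta, \xi_i) - AH)| \le 4|A|\varepsilon_k,$

$|\Delta t_i (|h_x(\theta, \xi_i)| - \tau)| \le 4\varepsilon_k.$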

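From the definitions of $H_n$ and $D_k$ we immediately get the following lemma.

Lemma 1.  Under (A1) - (A3), there is a constant $K$ such that for each $k$ and $\omega \notin D_k$ and $j\varepsilon_k \ge t_{M_k}$,

$\displaystyle\sum_{i=m(j\varepsilon_k)}^{m(j\varepsilon_k+\varepsilon_k)-1} \Delta t_i |H_i| \le K\varepsilon_k,$

$\Big| \displaystyle\sum_{i=m(j\varepsilon_k)}^{m(j\varepsilon_k+t)-1} \Delta t_i (H_i - \bar H) \Big| \le K\varepsilon_k, \qquad t \le \varepsilon_k.$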


We now proceed to put the iteration (2.2) into a more convenient form. Define $C_n^N$ by $C_{N+1}^N = I$ and for $n \le N$,

$C_n^N = \displaystyle\prod_{i=n}^{N} (I + \Delta t_i H_i) = (I + \Delta t_N H_N) \cdots (I + \Delta t_n H_n).$

Lemma 2.  Assume (A1) to (A3). Then on a set whose probability is arbitrarily close to 1

(3.1)    $C_{m(t_N+s)}^{m(t_N+t+s)} \to \exp \bar H t$

as $N \to \infty$, uniformly on bounded $t$-intervals. Also, there is a real $K$ such that for each $k$ and each $N \ge M_k$ and $\omega \notin D_k$ and $t \le \varepsilon_k$

(3.2)    $C_{m(t_N+s)}^{m(t_N+t+s)} = [I + \bar H t + o],$

where $|o| \le K\varepsilon_k$.
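A scalar sanity check of the limit in (3.1) may help intuition. The sketch below makes a strong simplifying assumption that is not in the paper: every $H_i$ is replaced by the same constant $\bar H$ (here `h_bar`), so only the discretization effect of the shrinking steps $\Delta t_i = (i+1)^{-\alpha}$ is visible, not the averaging of the random $H_i$. The function name is hypothetical.

```python
import math

def product_vs_exponential(h_bar, t, N, alpha=1.0):
    """Compare prod_i (1 + dt_i * h_bar) over the steps inside [t_N, t_N + t]
    with exp(h_bar * t), where dt_i = (i+1)^(-alpha) and i starts at N."""
    prod, elapsed, i = 1.0, 0.0, N
    while True:
        dt = (i + 1) ** (-alpha)
        if elapsed + dt > t:        # the next step would leave the interval
            break
        prod *= 1.0 + dt * h_bar    # one factor (I + dt_i H_i), scalar case
        elapsed += dt
        i += 1
    return prod, math.exp(h_bar * t)

prod, target = product_vs_exponential(-0.8, 1.0, 10000)
print(prod, target)
```

For large $N$ the steps are small, so the product is close to the exponential; the lemma asserts the analogous statement when the $H_i$ are random and only their $\Delta t_i$-weighted averages settle down.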


Proof.  (3.1) follows directly from (3.2) and we only prove (3.2) for $t \le \varepsilon_k$ and $s = 0$. For $M > m$ we have

$C_m^M = \displaystyle\prod_{i=m}^{M} (I + \Delta t_i H_i) = I + \sum_{i=m}^{M} \Delta t_i H_i + \sum_{i_2=m}^{M} \sum_{i_1 > i_2} \Delta t_{i_1} \Delta t_{i_2} H_{i_1} H_{i_2} + \cdots + \Delta t_M \cdots \Delta t_m H_M \cdots H_m.$

(3.3)    $\Big| C_m^M - \Big( I + \displaystyle\sum_{i=m}^{M} \Delta t_i H_i \Big) \Big| \le \sum_{i_2=m}^{M} \sum_{i_1 > i_2} \Delta t_{i_1} \Delta t_{i_2} |H_{i_1}| |H_{i_2}| + \cdots + \Delta t_M \cdots \Delta t_m |H_M| \cdots |H_m|.$




Theorem 1.  Under (A1) to (A5), $\{U_n\}$ is tight on $R^r$.


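Proof.  By iterating (2.2) we get

(3.5)    $U_{N+n+1} = C_N^{N+n} U_N + \displaystyle\sum_{\ell=0}^{n} C_{N+\ell+1}^{N+n} A\, \delta \tilde W_{N+\ell}.$

Define

$W_j^m = \delta W_j + \cdots + \delta W_m, \qquad \tilde W_j^m = \delta \tilde W_j + \cdots + \delta \tilde W_m.$

Then a summation by parts of (3.5) yields

(3.6)    $U_{N+n+1} = C_N^{N+n} U_N + C_{N+1}^{N+n} A \tilde W_N^{N+n} - \displaystyle\sum_{\ell=1}^{n} C_{N+\ell+1}^{N+n} H_{N+\ell} A \tilde W_{N+\ell}^{N+n} \Delta t_{N+\ell}.$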


The estimate (3.4) will now be used heavily. By dividing the interval $[t_N, t_{N+n+1}]$ into subintervals of length $\varepsilon_k$ (except for the last subinterval, which is $\le \varepsilon_k$) and using (3.4), we get that there is a sequence of real numbers $\delta_k \to 0$ such that if $\omega \notin D_k$ and $N \ge M_k$, then


(3.7) 


N+n+1 ' P 


U+6k)  exp  jj  f(tN+n+1-tN)J-  1^1, 
t d+«k)  exp 


+ (l+6k) 


exp["  Wrw1'AVJwOp- 


Henceforth, purely for notational convenience, we suppose that the $\delta W_i$ are scalar-valued. In general, we need only work with one component at a time anyway. Proceeding, let us next evaluate $E|W_m^M|^2$:


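(3.8)    $E|W_m^M|^2 = E \displaystyle\sum_{i,j=m}^{M} \sqrt{\Delta t_i \Delta t_j}\, h_i h_j \le 2 \sum_{i=m}^{M} \sum_{j \ge i} \sqrt{\Delta t_i} \sqrt{\Delta t_j}\, |E h_i h_j|$

         $\le 2 \displaystyle\sum_{i=m}^{M} \sum_{j \ge i} \sqrt{\Delta t_i} \sqrt{\Delta t_j}\, |R_{ij}| \le 2K \sum_{i=m}^{M} \Delta t_i = 2K(t_{M+1} - t_m),$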


where the last inequality follows by the first half of (A5) (note that $\Delta t_j \le \Delta t_i$ for $j \ge i$). With perhaps a different $K$, the same inequality holds for $E|\tilde W_m^M|^2$. By this estimate and the second half of (A5), there is a constant $K_k$ such that for $N \ge M_k$

(3.9)  ElHN+!lAWN+«.lp  I{a)4Dk}  - Kk(tN+n+l_tN+)l) 

Inequality (3.7) holds with probability $1 - P\{D_k\} = p_k \to 1$. Let us modify the $\{U_i, H_i, i \ge M_k\}$ on $D_k$ in a way such that (3.7) holds for all $\omega$ and (3.9) holds without the indicator function and where $K_k$ does not depend on $k$. Let $\{U_i^k, H_i^k\}$ denote the altered sequence. Then (3.7) and (3.9) together imply that $\sup_{i \ge M_k} E|U_i^k|^2 < \infty$. Thus the sequence $\{U_i, i < M_k;\ U_i^k, i \ge M_k\}$ is tight on $R^r$. Since $k$ is arbitrary and $p_k \to 1$ as $k \to \infty$, this implies that the original $\{U_i\}$ sequence is tight. Q.E.D.

4.  Weak Convergence of $\{U^N(\cdot)\}$ and the Rate of Convergence

In this section, we show that $\{U^N(\cdot)\}$ converges weakly in $D^r[0,\infty)$ to the stationary solution to the Gauss-Markov diffusion

(4.1)    $dU = \bar H U\,dt + A R^{1/2}\,dB,$

where $B(\cdot)$ is a standard Wiener process and $R^{1/2}$ is a square root of the matrix $R$ in (A6) below. In particular, this implies that $(X_n - \theta)(n+1)^{\alpha/2}$ converges in distribution to a normal random variable with mean 0 and covariance

$\displaystyle\int_0^\infty (\exp \bar H t)\, A R A'\, (\exp \bar H' t)\,dt.$
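In the scalar case the covariance integral above reduces to the closed form $A^2 R / (-2\bar H)$, with $\bar H = AH + 1/2$ for $\alpha = 1$ and $\bar H = AH$ for $\alpha < 1$. The following sketch evaluates that closed form and cross-checks it against a simple midpoint quadrature; the function names and the numerical values are illustrative assumptions only.

```python
import math

def asymptotic_variance(A, H, R, alpha):
    """Scalar case of the covariance integral: A^2 R / (-2 H_bar)."""
    h_bar = A * H + 0.5 if alpha == 1 else A * H
    if h_bar >= 0:
        raise ValueError("(A4) fails: H_bar must be negative")
    return A * A * R / (-2.0 * h_bar)

def variance_by_quadrature(A, H, R, alpha, t_max=60.0, n=200000):
    """Midpoint-rule approximation of the integral of exp(2 H_bar t) A^2 R on [0, t_max]."""
    h_bar = A * H + 0.5 if alpha == 1 else A * H
    dt = t_max / n
    total = sum(math.exp(2.0 * h_bar * (k + 0.5) * dt) for k in range(n))
    return total * A * A * R * dt

print(asymptotic_variance(1.0, -1.0, 1.0, 1))     # alpha = 1
print(asymptotic_variance(1.0, -1.0, 1.0, 0.8))   # alpha < 1
```

For $\alpha = 1$ the closed form can be rewritten as $A^2 R / (2A|H| - 1)$, which makes visible why the gain must satisfy $A|H| > 1/2$ in that case.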


We will require the following additional assumptions.

(A6)  $\{h_j\}$ is a stationary sequence, and $E|h_j|^6 < \infty$. Define $R(i) = E h_j h'_{j+i}$. Then $R = \sum_{i=-\infty}^{\infty} R(i)$ is bounded by (A8).

Let $\mathcal{B}_j = \mathcal{B}(h_i, i \le j)$ and let $E_j$ denote the expectation conditional on $\mathcal{B}_j$.

(A7)  Define $\rho_1(i)$ by

$\rho_1(i) = \displaystyle\sup_{j,\ell} E^{1/2} |E_j h_{j+i} h'_{j+i+\ell} - R(\ell)|^2.$

Then $\sum_i \rho_1^{1/2}(i) < \infty$.

The $\sup_j$ above and $\sup_k$ below are redundant if we assume that the $\{h_j\}$ process started at $j = -\infty$, and choose the sample space appropriately.

(A8)  Define $\rho_2(i)$ by $\rho_2(i) = \sup_k E^{1/2} |E_k h_{k+i}|^2$. Then $\sum_i \rho_2^{1/2}(i) < \infty$.
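The matrix $R$ in (A6) is the sum of the autocovariances of $\{h_j\}$ over all lags. As a small worked example in the scalar case, the sketch below sums a truncated autocovariance sequence and compares it with the closed form for a geometrically decaying autocovariance $R(i) = \sigma^2 \rho^{|i|}$. That particular autocovariance, and the helper names, are assumptions chosen for illustration and do not come from the paper.

```python
def long_run_variance(autocov, max_lag):
    """R = sum over |i| <= max_lag of R(i), scalar case, using R(-i) = R(i)."""
    return autocov(0) + 2.0 * sum(autocov(i) for i in range(1, max_lag + 1))

def geometric_autocov(sigma2, rho):
    """Illustrative autocovariance R(i) = sigma2 * rho^|i|."""
    return lambda i: sigma2 * rho ** abs(i)

R = long_run_variance(geometric_autocov(1.0, 0.5), 200)
print(R)   # closed form: sigma2 * (1 + rho) / (1 - rho)
```

Positive correlation between successive $h_j$ inflates $R$, and hence the limiting covariance, relative to the value $R(0)$ that would apply to an orthogonal noise sequence.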


We now give some examples of (A7) and (A8). First suppose that $\{h_j\}$ is a stationary and bounded $\varphi$-mixing process in the sense of [7, p. 166], with of course $E h_j = 0$. Let $K$ denote an arbitrary constant. By [8, Lemma 1], $|E_j h_{j+k}| \le K\varphi_k$ and $|E_j h_{j+k} h'_{j+k+\ell} - R(\ell)| \le K\varphi_k$. Thus $\rho_1(i) \le K\varphi_i$, $\rho_2(i) \le K\varphi_i$. If $\sum_k \varphi_k^{1/2} < \infty$, then (A7) and (A8) hold. However, if the $h_j$ are bounded and $\varphi$-mixing, then a slightly different proof of Theorem 2 can be given, requiring only a weaker summability condition on $\{\varphi_k\}$.


An example of (A6) to (A8). Let $Q$ denote a matrix whose eigenvalues are inside the unit circle, let $\{\psi_n\}$ denote a sequence of independent and identically distributed Gaussian random variables and define $\xi_n$, $\infty > n > -\infty$, by $\xi_{n+1} = Q\xi_n + \psi_n$. Then $\{\xi_n\}$ is a stationary sequence. Let $E h(\theta, \xi_j) = E h_j = 0$ and suppose that $h(\cdot) = h(\theta, \cdot)$ satisfies a uniform Lipschitz condition, with constant $K_1$. Let $\mathcal{B}_j$ measure $\psi_i$, $i \le j$.

Let us evaluate $E|E_k h(\xi_{k+i})|^2$. Let $\{\tilde\psi_j\}$ denote a sequence with the same distribution as $\{\psi_j\}$, but independent of it. We have

$\xi_{k+i} = Q^i \xi_k + \displaystyle\sum_{\ell=0}^{i-1} Q^\ell \psi_{k+i-1-\ell},$

which (conditional on $\mathcal{B}_k$) has the same distribution as

$\displaystyle\sum_{\ell=0}^{\infty} Q^\ell \tilde\psi_\ell - \sum_{\ell=i}^{\infty} Q^\ell \tilde\psi_\ell + Q^i \xi_k.$

Using the fact that the first term above has the same distribution as $\xi_m$ has for any $m$, together with the Lipschitz condition, yields

$\Big| E_k \Big[ h\Big(\text{first term} - \displaystyle\sum_{\ell=i}^{\infty} Q^\ell \tilde\psi_\ell + Q^i \xi_k \Big) - E h(\text{first term}) \Big] \Big| \le K_1 E \Big| \sum_{\ell=i}^{\infty} Q^\ell \tilde\psi_\ell \Big| + K_1 |Q^i \xi_k|,$

from which (A8) follows. A similar (and omitted) calculation yields (A7).
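For a concrete feel for (A8) in this example, consider the scalar special case $\xi_{n+1} = q\,\xi_n + \psi_n$ with $|q| < 1$, innovations of standard deviation $\sigma$, and the linear choice $h(\theta, \xi) = \xi$ (an illustrative assumption; it is Lipschitz and has mean zero). Then $E_k \xi_{k+i} = q^i \xi_k$, so $\rho_2(i) = |q|^i \sigma / \sqrt{1 - q^2}$ and the series $\sum_i \rho_2^{1/2}(i)$ is geometric. The sketch below checks the partial sums against the closed form; the function names are hypothetical.

```python
import math

def rho2(i, q, sigma):
    """rho_2(i) = E^{1/2}|E_k xi_{k+i}|^2 for scalar xi_{n+1} = q xi_n + psi_n, h = xi."""
    stationary_std = sigma / math.sqrt(1.0 - q * q)   # std of the stationary xi_k
    return abs(q) ** i * stationary_std

def sum_sqrt_rho2(q, sigma, n_terms):
    """Partial sum of rho_2^{1/2}(i) for i = 0, ..., n_terms - 1."""
    return sum(math.sqrt(rho2(i, q, sigma)) for i in range(n_terms))

q, sigma = 0.25, 1.0
closed_form = math.sqrt(sigma / math.sqrt(1.0 - q * q)) / (1.0 - math.sqrt(abs(q)))
print(sum_sqrt_rho2(q, sigma, 200), closed_form)
```

The geometric decay of $Q^i$ is what makes the summability conditions in (A7) and (A8) hold in the general matrix case as well.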

Theorem 2.  Under (A1) - (A8), $\{U^N(\cdot)\}$ converges weakly to the stationary solution to (4.1).

Part 1.  Define the "approximation to a Wiener process" $W^N(\cdot)$ by

$W^N(t) = W_N^{m(t_N+t)-1} = \displaystyle\sum_{i=N}^{m(t_N+t)-1} \sqrt{\Delta t_i}\, h_i,$

with a similar definition for $\tilde W^N(\cdot)$ (but using $\delta \tilde W_i$ in lieu of $\delta W_i$). We will show that $\{W^N(\cdot)\}$ is tight in $D^r[0,\infty)$ and converges to a Wiener process with covariance matrix $Rt$. It easily follows from this that the same result must hold for $\{\tilde W^N(\cdot)\}$, since $((n+2)/(n+1))^{\alpha/2} = 1 + O(1/n)$ implies that $\{|W^N(\cdot) - \tilde W^N(\cdot)|\}$ tends weakly to the zero process.


First we prove tightness of $\{W^N(\cdot)\}$. For notational convenience only, we assume that the $h_j$ are scalar-valued in this part of the proof. Otherwise, we would work with one component at a time anyway, so there is no loss of generality. Let $\ell > k > j > i$. We have

(4.2)    $|E h_i h_j h_k h_\ell| \le |E h_i h_j [E_j h_k h_\ell - E h_k h_\ell]| + |E h_i h_j|\,|E h_k h_\ell|.$

The first term on the right satisfies (use (A7))

$|E h_i h_j [E_j h_k h_\ell - E h_k h_\ell]| \le E^{1/2}|h_i h_j|^2\, E^{1/2}|E_j h_k h_\ell - E h_k h_\ell|^2 \le K\rho_1(k-j).$

By (A8), the first term on the right of (4.2) is bounded above by

$|E h_i h_j h_k E_k h_\ell| + |E h_i h_j|\,|E h_k E_k h_\ell| \le E^{1/2}|h_i h_j h_k|^2\, E^{1/2}|E_k h_\ell|^2 + |E h_i h_j|\, E^{1/2} h_k^2\, E^{1/2}|E_k h_\ell|^2 \le K\rho_2(\ell-k).$

Since both bounds hold, the first term is at most $K\min(\rho_1(k-j), \rho_2(\ell-k)) \le K\rho_1^{1/2}(k-j)\rho_2^{1/2}(\ell-k)$. Thus

(4.3)    $|E h_i h_j h_k h_\ell| \le K\rho_1^{1/2}(k-j)\rho_2^{1/2}(\ell-k) + |R(j-i)|\,|R(\ell-k)|.$


Using these bounds, we get

$E|W^N(t+s) - W^N(t)|^4 = E \Big| \displaystyle\sum_{i=m(t_N+t)}^{m(t_N+t+s)-1} \sqrt{\Delta t_i}\, h_i \Big|^4$

$\le K \displaystyle\sum_{i<j<k<\ell} (\Delta t_i \Delta t_j \Delta t_k \Delta t_\ell)^{1/2} |E h_i h_j h_k h_\ell|$

(summation between $m(t_N+t)$ and $m(t_N+t+s)-1$; at each use of $K$ it may have a different value)

$\le K \displaystyle\sum_{i<j<k<\ell} (\Delta t_i \Delta t_j \Delta t_k \Delta t_\ell)^{1/2} [\rho_1^{1/2}(k-j)\rho_2^{1/2}(\ell-k) + |R(j-i)| \cdot |R(\ell-k)|]$

(sum over $\ell$ and use $\Delta t_k \ge \Delta t_\ell$)

$\le K \displaystyle\sum_{i<j<k} (\Delta t_i \Delta t_j)^{1/2} \Delta t_k [\rho_1^{1/2}(k-j) + |R(j-i)|]$

(sum over $j$ and use $\Delta t_i \ge \Delta t_j$)

(4.3)    $\le K \displaystyle\sum_{i<k} \Delta t_i \Delta t_k \le K s^2,$

where the last inequality holds if $t_N+t+s$ and $t_N+t$ take values in the set $\{t_n\}$. If (4.3) holds for all $t$, $s$, $N$, then [7], Theorems 15.5 and 12.3 imply that $\{W^N(\cdot)\}$ is tight in $D^r[0,\infty)$ and that all processes which are weak limits have continuous paths w.p. 1. But, since $\Delta t_n \to 0$ and the paths are piecewise constant, it is enough that (4.3) hold for $t_N+t+s$ and $t_N+t$ in the $\{t_n\}$ set. Thus $\{W^N(\cdot)\}$ is tight and all limit processes have continuous paths w.p. 1.


Part 2.  Now, the $h_j$ are treated as vectors rather than scalars. Let $N$ index a weakly convergent subsequence of $\{W^N(\cdot)\}$ and denote the (continuous w.p. 1) weak limit by $W(\cdot)$. Note that (4.3) implies that $\{|W^N(t)|^2\}$ is uniformly integrable. Let $s_i \le t \le t+s$ and $q$ be arbitrary. Let $g(\cdot)$ denote a bounded continuous function of $W^N(s_i)$, $i \le q$, and let $E^*$ denote expectation conditioned on $\{h_j, j \le m(t_N+t)-1\}$. Then

$E g(W^N(s_i), i \le q)[W^N(t+s) - W^N(t)] = E g(W^N(s_i), i \le q)\, E^* \displaystyle\sum_{i=m(t_N+t)}^{m(t_N+t+s)-1} \sqrt{\Delta t_i}\, h_i$

goes to zero as $N \to \infty$ by (A8). This together with the uniform integrability and weak convergence imply that $E g(W(s_i), i \le q)[W(t+s) - W(t)] = 0$ for all $q$, bounded continuous $g$ and $\{s_i\} \le t \le t+s$. Thus $W(\cdot)$ is a continuous martingale. To compute its quadratic variation, repeat the above argument with $[W^N(t+s) - W^N(t)][W^N(t+s) - W^N(t)]'$ replacing $[W^N(t+s) - W^N(t)]$. Using (A6), the weak convergence and uniform integrability yields

$E g(W^N(s_i), i \le q)[W^N(t+s) - W^N(t)][W^N(t+s) - W^N(t)]' \to E g(W(s_i), i \le q)\, Rs.$

Then the arbitrariness of $g$ and $\{s_i\} \le t \le t+s$ yield that the quadratic variation (at $s$) is $Rs$. Thus $W(\cdot)$ is a Wiener process with covariance $Rs$, as asserted. This result does not depend on the chosen convergent subsequence.

Part 3.  Define the function $C^N(t, t+s) = C_{m(t_N+t)}^{m(t_N+t+s)-1}$. Define $H^N(\cdot)$ with values $H^N_t = H_{N+n}$ in $[t_{N+n} - t_N, t_{N+n+1} - t_N)$.


Define  a function 


Then for $t \in \{t_{N+i} - t_N, i \ge 0\}$, and modulo a factor for each term which goes to zero uniformly in $t$ w.p. 1 as $N \to \infty$, the sum (3.6) can be written in the integral form (since the integrand is constant over $\Delta t_i$ intervals)

(4.4)    $U^N(t) = C^N(0,t) U^N(0) + C^N(0,t) A W^N(t) - \displaystyle\int_0^t C^N(s,t) H^N_s A [W^N(t) - W^N(s)]\,ds.$


For $t > 0$, between the $\{t_n\}$, the integral in (4.4) is just a linear interpolation instead of a piecewise constant interpolation of the sum in (3.6), and we may work with it instead. Define $\mathcal{H}^N(\cdot)$ by $\mathcal{H}^N(t) = \sum_{i=m(t_N)}^{m(t_N+t)-1} H_i \Delta t_i$. By (A3), $\{\mathcal{H}^N(\cdot)\}$ is tight in $D[0,\infty)$ and all limits are the constant process with value $\bar H t$ at $t$. Note that $\{C^N(0,t)\}$ is tight on $D^q[0,\infty)$ for an appropriate integer $q$, since it converges to $\exp \bar H t$ uniformly on bounded intervals w.p. 1.

We now have essentially all the limits that are required. If $H^N_s$ converged to the constant $\bar H$ w.p. 1 as $N \to \infty$, then the weak convergence of $W^N(\cdot)$ and convergence of $C^N(s,t)$ would imply that (4.4) holds with all functions replaced by their limits (and a weakly convergent subsequence of $\{U^N(0)\}$ taken). Since $H^N_s$ does not usually converge in the above sense, a slightly indirect method must be used to allow us to make the replacements suggested above. It is convenient to have all the random functions defined on the same space and to work with w.p. 1 rather than with weak convergence. To do this we apply the imbedding technique of Skorokhod [9], Theorem 3.1.1. The family $\{U^N(0), \mathcal{H}^N(\cdot), W^N(\cdot), C^N(0,\cdot)\} = \{\Phi^N(\cdot)\}$ is tight in the appropriate space $R^r \times D^{2r+q}[0,\infty) = \mathcal{D}$, and all limit functions are continuous w.p. 1. Extract a convergent subsequence, index it by $N$, and denote the limit by $(U(0), \mathcal{H}(\cdot), W(\cdot), C(0,\cdot)) = \Phi(\cdot)$. By the Skorokhod imbedding method [9], Theorem 3.1.1, there exists a probability space $(\tilde\Omega, \tilde P, \tilde{\mathcal{B}})$ with random processes $\{\tilde U^N(0), \tilde{\mathcal{H}}^N(\cdot), \tilde W^N(\cdot), \tilde C^N(0,\cdot)\} = \{\tilde\Phi^N(\cdot)\}$ and $(\tilde U(0), \tilde{\mathcal{H}}(\cdot), \tilde W(\cdot), \tilde C(0,\cdot)) = \tilde\Phi(\cdot)$ defined on it, where $\tilde\Phi^N(\cdot)$ (resp., $\tilde\Phi(\cdot)$) has the same distribution as $\Phi^N(\cdot)$ (resp., $\Phi(\cdot)$), all the processes in $\tilde\Phi(\cdot)$ have continuous paths and $\tilde\Phi^N(\cdot) \to \tilde\Phi(\cdot)$ w.p. 1 in the topology of $\mathcal{D}$. Since the limit processes are continuous, this means uniform convergence on bounded intervals. From $\tilde{\mathcal{H}}^N(\cdot)$, we can recover the random variables $\tilde H_{N+i}$, $i \ge 0$, from which it was constructed, since $\tilde{\mathcal{H}}^N(\cdot)$ is also piecewise constant w.p. 1. Also $\{\tilde H_{N+i}, i \ge 0\}$ has the same distribution as has $\{H_{N+i}, i \ge 0\}$.

We work with the imbedded processes, but drop the tilde affix. Now, return to (4.4) and, via the imbedding, suppose that all weak convergences are w.p. 1 in the above-cited topology. The first two terms of (4.4) converge to $(\exp \bar H t) U(0)$ and $(\exp \bar H t) A W(t)$, resp. Note that $C^N(s,t) = C^N(0,t)[C^N(0,s)]^{-1}$ also converges w.p. 1 uniformly on bounded sets to $\exp \bar H (t-s)$. We next write the integral in (4.4) in a more convenient way.


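Let $\Delta > 0$, and let $M = \max\{i: i\Delta \le t\}$. We have

$\Big| \displaystyle\sum_{i=0}^{M-1} \int_{i\Delta}^{i\Delta+\Delta} \{ C^N(s,t) H^N_s A [W^N(t) - W^N(s)] - C(i\Delta,t) H^N_s A [W(t) - W(i\Delta)] \}\,ds$

$\quad + \displaystyle\int_{M\Delta}^{t} \{ C^N(s,t) H^N_s A [W^N(t) - W^N(s)] - C(M\Delta,t) H^N_s A [W(t) - W(M\Delta)] \}\,ds \Big|$

$\le \displaystyle\sum_{i=0}^{M} \sup_{i\Delta \le s \le i\Delta+\Delta} [\,|C^N(s,t) - C(i\Delta,t)| + |W^N(s) - W(i\Delta)| + |W(t) - W^N(t)|\,]$

$\qquad \cdot\, [\,|W^N(t) - W^N(s)| + |C(s,t)|\,] \displaystyle\int_{i\Delta}^{i\Delta+\Delta} |H^N_s|\,|A|\,ds.$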

By the w.p. 1 uniform convergences (on bounded intervals) and continuity of the limit functions and the estimate (A3b), the limit of the above expression goes to zero uniformly on bounded $t$ sets, w.p. 1, as $N \to \infty$ and then $\Delta \to 0$.

Thus, we need only examine the limits of

(4.5)    $\displaystyle\sum_{i=0}^{M-1} \int_{i\Delta}^{i\Delta+\Delta} C(i\Delta,t) H^N_s A [W(t) - W(i\Delta)]\,ds + \int_{M\Delta}^{t} C(M\Delta,t) H^N_s A [W(t) - W(M\Delta)]\,ds.$

But, by (A3a), (4.5) converges to the same expression with $\bar H$ replacing $H^N_s$, uniformly on bounded intervals, w.p. 1 as $N \to \infty$. By the above calculations we can write the limit of the third term in (4.4) as

(4.6)    $-\displaystyle\int_0^t C(s,t) \bar H A [W(t) - W(s)]\,ds$

for the imbedded, hence the original processes. Thus $U^N(t)$ (the imbedded process) converges to

(4.7)    $U(t) = C(0,t) U(0) + C(0,t) A W(t) + (4.6)$

uniformly on finite intervals, w.p. 1. Consequently the original $U^N(\cdot)$ converges weakly to the process (4.7). But (4.7) is the unique solution to (4.1) with initial condition $U(0)$. The form is independent of the selected convergent subsequences.


Also, via an integration by parts,

(4.8)    $U(t) = C(0,t) U(0) + \displaystyle\int_0^t C(s,t) A\,dW_s.$

We need only show that $U(0)$ is the "stationary" initial condition. This can be easily shown in the following manner. The set of all possible $U(0)$ is tight because $\{U_n\}$ is. Also the weak limits of $\{U^N(\cdot)\}$ are also weak limits of the restrictions to $[T,\infty)$ of the weak limits of (the functions are left-shifted by $T$) $\{U^{m(t_N-T)}(\cdot)\}$ on $D^r[0,\infty)$, since $U^{m(t_N-T)}(T) = U_N$. But the latter limits are of the form (4.8) also. The restriction to $[T,\infty)$ involves simply replacing $t$ by $T+t$ in (4.8). From this, the tightness of possible $U(0)$, the arbitrariness of $T$ and the fact that $C(0,t) = \exp \bar H t \to 0$ as $t \to \infty$ we get that $U(0)$ must be the "stationary" initial condition. Q.E.D.

